import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 6))

# Scatter plot of salary vs. max experience, colored by cluster
sns.scatterplot(
    x=features['SALARY'],
    y=features['MAX_YEARS_EXPERIENCE'],
    hue=eda.loc[features.index, 'Cluster'],
    palette='Set2',
    s=40,
    edgecolor='white',
    linewidth=0.5
)

# Plot centroids, converted from standardized units back to original units
centroids = kmeans.cluster_centers_
plt.scatter(
    centroids[:, 0] * X.std(axis=0)[0] + X.mean(axis=0)[0],
    centroids[:, 1] * X.std(axis=0)[1] + X.mean(axis=0)[1],
    marker='X', s=200, c='black', label='Centroids'
)

# Titles and labels
plt.title("KMeans Clustering by Salary and Max Years Experience", fontsize=16)
plt.xlabel("Salary", fontsize=12)
plt.ylabel("Max Years Experience", fontsize=12)
plt.legend(title='Cluster', loc='upper right')
plt.grid(True)
plt.tight_layout()
plt.show()
import plotly.express as px
import plotly.graph_objects as go
from IPython.display import HTML

# 1) Build the DataFrame
df_plot = features.copy()
df_plot['Cluster'] = eda.loc[features.index, 'Cluster']

# 2) Compute centroids in original units
centroids = kmeans.cluster_centers_
centroids_x = centroids[:, 0] * X.std(axis=0)[0] + X.mean(axis=0)[0]
centroids_y = centroids[:, 1] * X.std(axis=0)[1] + X.mean(axis=0)[1]

# 3) Create an interactive Plotly Figure
fig = px.scatter(
    df_plot,
    x='SALARY',
    y='MAX_YEARS_EXPERIENCE',
    color='Cluster',
    title="KMeans Clustering by Salary and Max Years Experience",
    labels={
        'SALARY': 'Salary',
        'MAX_YEARS_EXPERIENCE': 'Max Years Experience',
        'Cluster': 'Cluster'
    },
    width=800,
    height=500,
)

# 4) Add centroid traces
fig.add_trace(
    go.Scatter(
        x=centroids_x,
        y=centroids_y,
        mode='markers',
        marker=dict(symbol='x', size=18, color='black',
                    line=dict(width=2, color='white')),
        name='Centroids'
    )
)

# 5) Render the full HTML and embed it
html = fig.to_html(include_plotlyjs='cdn')
HTML(html)
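The centroid step above multiplies each standardized centroid by the feature's standard deviation and adds back the mean. The sketch below checks that arithmetic on a small made-up array; the values and the two-row "cluster" are assumptions for illustration, not the notebook's data. It assumes the features were standardized as (X - mean) / std with the same std convention used to undo the scaling. NumPy's `std` defaults to `ddof=0`, as does scikit-learn's `StandardScaler`, whereas pandas' `.std()` defaults to `ddof=1`, so mixing the two would shift the centroids slightly.

```python
import numpy as np

# Made-up feature matrix: columns are salary and max years of experience.
X = np.array([
    [60_000.0, 1.0],
    [70_000.0, 2.0],
    [150_000.0, 8.0],
    [180_000.0, 12.0],
])

mean = X.mean(axis=0)
std = X.std(axis=0)            # NumPy default: ddof=0 (population std)
X_scaled = (X - mean) / std    # standardized features

# Pretend the first two rows form one cluster; in scaled space the
# centroid of a cluster is the mean of its points.
centroid_scaled = X_scaled[:2].mean(axis=0)

# Undo the standardization to express the centroid in original units.
centroid_original = centroid_scaled * std + mean
print(centroid_original)
```

If the notebook standardized with a fitted `StandardScaler`, calling its `inverse_transform` on `kmeans.cluster_centers_` would perform the same conversion without having to match the std convention by hand.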
The model yields four clusters. Cluster 0 (green) has lower salaries, mostly under $150k, and 2-5 years of maximum experience; these are likely junior to mid-level employees with moderate pay. Cluster 1 (orange) has medium to high salaries spread over a wide range ($100k–$500k) but a narrow experience range of about 3 years, which suggests specialized or high-paying roles with short experience, possibly fast-track promotions or high-demand fields. Cluster 2 has low salaries and 0-4 years of experience, consistent with entry-level employees. Cluster 3 has medium salaries, mostly under $200k, with more experience (6-13 years); these are probably senior professionals who have more experience but not the highest salaries.
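The ranges quoted above are read off the scatter plot, so it can help to confirm them with a per-cluster summary table. The following is a minimal sketch: the small `df_plot` here is a made-up stand-in that reuses the notebook's column names, and the output column names (`salary_min`, `exp_max`, and so on) are hypothetical.

```python
import pandas as pd

# Hypothetical stand-in for the notebook's df_plot; values are invented.
df_plot = pd.DataFrame({
    "SALARY": [90_000, 120_000, 300_000, 450_000,
               60_000, 70_000, 150_000, 180_000],
    "MAX_YEARS_EXPERIENCE": [3, 5, 3, 3, 1, 2, 8, 12],
    "Cluster": [0, 0, 1, 1, 2, 2, 3, 3],
})

# Cluster size plus salary and experience ranges for each cluster.
summary = df_plot.groupby("Cluster").agg(
    n=("SALARY", "size"),
    salary_min=("SALARY", "min"),
    salary_median=("SALARY", "median"),
    salary_max=("SALARY", "max"),
    exp_min=("MAX_YEARS_EXPERIENCE", "min"),
    exp_max=("MAX_YEARS_EXPERIENCE", "max"),
)
print(summary)
```

Running the same `groupby` on the real `df_plot` would show whether, for example, Cluster 1 really spans the widest salary range while staying within a narrow band of experience.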
plt.figure(figsize=(10, 6))
plt.scatter(y_test, y_pred, alpha=0.6, color='skyblue')

# Reference line where predicted salary equals actual salary
plt.plot(
    [y_test.min(), y_test.max()],
    [y_test.min(), y_test.max()],
    color='red', linestyle='--', lw=2
)

plt.xlabel("Actual Salary")
plt.ylabel("Predicted Salary")
plt.title("Actual vs Predicted Salary (Multiple Regression)")
plt.grid(True)
plt.tight_layout()
plt.show()
This plot shows actual versus predicted salary for the multiple linear regression model. Each blue dot is one prediction, and the red dashed line is the ideal line where predicted = actual. Most points lie very close to the red line, which indicates that the model predicts salary accurately, with small errors and a strong linear fit; this would likely be reflected in an R² score near 1.0.
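The visual impression of a tight fit can be checked numerically. The sketch below computes R², mean absolute error, and root mean squared error directly from their definitions; the `y_test` and `y_pred` arrays are made-up values standing in for the notebook's variables, so the printed numbers say nothing about the actual model.

```python
import numpy as np

# Made-up values standing in for the notebook's y_test and y_pred.
y_test = np.array([50_000.0, 80_000.0, 120_000.0, 150_000.0, 200_000.0])
y_pred = np.array([52_000.0, 79_000.0, 118_000.0, 155_000.0, 196_000.0])

residuals = y_test - y_pred

# R^2 = 1 - (residual sum of squares) / (total sum of squares)
ss_res = np.sum(residuals ** 2)
ss_tot = np.sum((y_test - y_test.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot

mae = np.mean(np.abs(residuals))          # mean absolute error
rmse = np.sqrt(np.mean(residuals ** 2))   # root mean squared error

print(f"R^2:  {r2:.4f}")
print(f"MAE:  {mae:,.0f}")
print(f"RMSE: {rmse:,.0f}")
```

scikit-learn's `r2_score` and `mean_absolute_error` in `sklearn.metrics` compute the same R² and MAE. An R² very close to 1.0 on held-out data is also worth a second look, since it can point to a feature that leaks the target.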